# Multimodal Speech Processing
Ultravox v0.4 Llama 3.1 70B
MIT
Ultravox is a multimodal speech large language model built on the pre-trained Llama3.1-70B-Instruct and Whisper-medium backbones. It accepts speech and text as input at the same time.
Audio-to-Text
Transformers, Supports Multiple Languages

fixie-ai
79
4
Llama 3 Typhoon V1.5 8b Audio Preview
Typhoon-Audio Preview is a Thai and English audio-language model that takes text and audio as input and produces text output.
Audio-to-Text
Transformers

scb10x
218
12
Featured Recommended AI Models